Smart Paper Notes

Causality Detection using Multiple Annotation Decisions

Quynh Anh Nguyen , Arka Mitra

Category: Natural Language Processing

2022-10-26

This paper describes work submitted to the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022). The work addresses Subtask 1 of Shared Task 3, which aims to detect causality in a protest news corpus. The authors used different large language models with customized cross-entropy loss functions that exploit annotation information. The experiments showed that bert-base-uncased with refined cross-entropy outperformed the others, achieving an F1 score of 0.8501 on the Causal News Corpus dataset.
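The abstract does not spell out how the customized cross-entropy uses annotation information. One plausible reading, sketched below purely as an assumption, is to replace the hard label with the distribution of annotator votes. The function name `soft_label_cross_entropy` and the `votes` input are hypothetical and are not taken from the paper.

```python
import math

def soft_label_cross_entropy(logits, votes):
    """Cross-entropy against the annotators' vote distribution.

    logits: raw scores per class, e.g. [non_causal, causal]
    votes:  number of annotators who chose each class (hypothetical input)
    """
    total = sum(votes)
    targets = [v / total for v in votes]   # soft labels built from the votes
    peak = max(logits)                     # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(t * lp for t, lp in zip(targets, log_probs))

# Same prediction, two different levels of annotator agreement.
unanimous = soft_label_cross_entropy([0.0, 2.0], votes=[0, 3])
split = soft_label_cross_entropy([0.0, 2.0], votes=[1, 2])
```

With unanimous votes this reduces to ordinary cross-entropy; with split votes a confident prediction is penalized more, which is one way disagreement among annotators could enter the loss.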

Translated by Google Translate

QC-StyleGAN -- Quality Controllable Image Generation and Manipulation

Dat Viet Thanh Nguyen , Phong Tran The , Tan M. Dinh , Cuong Pham , Anh Tuan Tran

Category: Computer Vision | Artificial Intelligence

2022-12-02

The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradations and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides, for free, an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.


Machine Learning-based Framework for Optimally Solving the Analytical Inverse Kinematics for Redundant Manipulators

Minh Nhat Vu , Florian Beck , Christian Hartl-Nesic , Anh Nguyen , Andreas Kugi

Category: Robotics

2022-11-08

Solving the analytical inverse kinematics (IK) of redundant manipulators in real time is a difficult problem in robotics since its solution for a given target pose is not unique. Moreover, choosing the optimal IK solution with respect to application-specific demands helps to improve the robustness and to increase the success rate when driving the manipulator from its current configuration towards a desired pose. This is necessary, especially in high-dynamic tasks like catching objects in mid-flight. To compute a suitable target configuration in the joint space for a given target pose in the trajectory planning context, various factors such as travel time or manipulability must be considered. However, these factors increase the complexity of the overall problem, which impedes real-time implementation. In this paper, a real-time framework to compute the analytical inverse kinematics of a redundant robot is presented. To this end, the analytical IK of the redundant manipulator is parameterized by so-called redundancy parameters, which are combined with a target pose to yield a unique IK solution. Most existing works in the literature either try to approximate the direct mapping from the desired pose of the manipulator to the solution of the IK or cluster the entire workspace to find IK solutions. In contrast, the proposed framework directly learns these redundancy parameters by using a neural network (NN) that provides the optimal IK solution with respect to the manipulability and the closeness to the current robot configuration. Monte Carlo simulations show the effectiveness of the proposed approach, which is accurate and real-time capable (approximately 32 µs) on the KUKA LBR iiwa 14 R820.


Soft Robotic Link with Controllable Transparency for Vision-based Tactile and Proximity Sensing

Quan Khanh Luu , Dinh Quang Nguyen , Nhan Huu Nguyen , Van Anh Ho

Category: Robotics

2022-11-07

Robots have been brought to work close to humans in many scenarios. For coexistence and collaboration, robots should be safe and pleasant for humans to interact with. To this end, robots should be physically soft and equipped with multimodal sensing/perception, so that they have better awareness of the surrounding environment and can respond properly to humans' actions/intentions. This paper introduces a novel soft robotic link, named ProTac, that possesses multiple sensing modes: tactile and proximity sensing, based on computer vision and a functional material. These modalities come from a layered structure of a soft transparent silicone skin, a polymer dispersed liquid crystal (PDLC) film, and reflective markers. Here, the PDLC film can switch actively between the opaque and the transparent state, from which the tactile sensing and proximity sensing can be obtained by using cameras solely built inside the ProTac link. In this paper, inference algorithms for tactile and proximity perception are introduced. Evaluation results of the two sensing modalities demonstrated that, with a simple activation strategy, the ProTac link could effectively perceive useful information from both approaching and in-contact obstacles. The proposed sensing device is expected to bring in ultimate solutions for the design of robots with softness, whole-body and multimodal sensing, and safety control strategies.


Textual Manifold-based Defense Against Natural Language Adversarial Examples

Dang Minh Nguyen , Luu Anh Tuan

Category: Natural Language Processing | Machine Learning

2022-11-05

Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, making them significantly more challenging for current models to make correct predictions. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models, in which adversarial texts tend to have their embeddings diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. It reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. Through extensive experiments, our method consistently and significantly outperforms previous defenses under various attack settings without trading off clean accuracy. To the best of our knowledge, this is the first NLP defense that leverages the manifold structure against adversarial attacks. Our code is available at https://github.com/dangne/tmd.


SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation

Zhaorui Tan , Zihan Ye , Qiufeng Wang , Yuyao Yan , Anh Nguyen , Xi Yang , Kaizhu Huang

Category: Computer Vision

2022-10-27

Generating consistent and high-quality images from given texts is essential for visual-language understanding. Although impressive results have been achieved in generating high-quality images, text-image consistency is still a major concern in existing GAN-based methods. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, often resulting in very misleading semantics in the generated images. Despite its significance, how to design a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a further step forward to develop a novel CLIP-based metric termed Semantic Similarity Distance (SSD), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. Benefiting from the proposed metric, we further design the Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which aim at improving text-image consistency by fusing semantic information at different granularities and capturing accurate semantics. Equipped with two novel plug-and-play components, Hard-Negative Sentence Constructor and Semantic Projection, the proposed PDF-GAN can mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments show that, compared with current state-of-the-art methods, our PDF-GAN can lead to significantly better text-image consistency while maintaining decent image quality on the CUB and COCO datasets.


Improving Document Image Understanding with Reinforcement Finetuning

Bao-Sinh Nguyen , Dung Tien Le , Hieu M. Vu , Tuan Anh D. Nguyen , Minh-Tien Nguyen , Hung Le

Category: Computer Vision | Machine Learning

2022-09-26

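Successful artificial intelligence systems often require large amounts of labeled data to extract information from document images. In this paper, we investigate the problem of improving the performance of artificial intelligence systems in understanding document images, especially when training data is limited. We address the problem by proposing a novel finetuning method using reinforcement learning. Our approach treats the information extraction model as a policy network and uses policy gradient training to update the model to maximize combined reward functions that complement the traditional cross-entropy losses. Our experiments on four datasets using labels and expert feedback demonstrate that our finetuning mechanism consistently improves the performance of a state-of-the-art information extractor, especially in the small training data regime.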
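How a policy-gradient term can complement a cross-entropy loss may be easier to see in a small sketch. The following is a generic REINFORCE-style illustration for a single prediction, not the authors' implementation: the function `combined_loss`, the `baseline`, and the `rl_weight` mixing factor are all assumptions introduced here.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]

def combined_loss(logits, gold, sampled, reward, baseline=0.0, rl_weight=1.0):
    """Cross-entropy plus a REINFORCE term for one prediction.

    gold:    index of the labeled class (supervised signal)
    sampled: index of the class sampled from the policy
    reward:  scalar reward for the sampled class (hypothetical, e.g. from
             labels or expert feedback)
    """
    log_probs = log_softmax(logits)
    cross_entropy = -log_probs[gold]
    # REINFORCE: minimizing this raises the log-probability of sampled
    # actions whose reward exceeds the baseline.
    policy_gradient = -(reward - baseline) * log_probs[sampled]
    return cross_entropy + rl_weight * policy_gradient

loss_rewarded = combined_loss([0.0, 0.0], gold=0, sampled=0, reward=1.0)
loss_neutral = combined_loss([0.0, 0.0], gold=0, sampled=0, reward=0.0)
```

When the reward equals the baseline, the policy-gradient term vanishes and the loss falls back to plain cross-entropy, which is why the two terms can be mixed without changing the supervised objective.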


Uncertainty-aware Label Distribution Learning for Facial Expression Recognition

Nhat Le , Khanh Nguyen , Quang Tran , Erman Tjiputra , Bac Le , Anh Nguyen

Category: Computer Vision

2022-09-21

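Despite significant progress over the past few years, ambiguity is still a key challenge in facial expression recognition (FER). It can lead to noisy and inconsistent annotations, which hinder the performance of deep learning models in real-world scenarios. In this paper, we propose a new uncertainty-aware label distribution learning method to improve the robustness of deep models against uncertainty and ambiguity. We leverage neighborhood information in the valence-arousal space to adaptively construct emotion distributions for training samples. We also consider the uncertainty of the provided labels when incorporating them into the label distributions. Our method can be easily integrated into a deep network to obtain more training supervision and improve recognition accuracy. Intensive experiments on several datasets under various noisy and ambiguous settings show that our method achieves competitive results and outperforms recent state-of-the-art approaches. Our code and models are available at https://github.com/minhnhatvt/label-distribution-learning-fer-tf.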


Inverse Image Frequency for Long-tailed Image Recognition

Konstantinos Panagiotis Alexandridis , Shan Luo , Anh Nguyen , Jiankang Deng , Stefanos Zafeiriou

Category: Computer Vision

2022-09-11

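The long-tailed distribution is a common phenomenon in the real world. Extracted large-scale image datasets inevitably exhibit the long-tailed property, and models trained on imbalanced data can obtain high performance for the over-represented categories but struggle with the under-represented categories, leading to biased predictions and performance degradation. To address this challenge, we propose a novel de-biasing method named Inverse Image Frequency (IIF). IIF is a multiplicative margin adjustment transformation of the logits in the classification layer of a convolutional neural network. Our method achieves stronger performance than similar works and is especially useful for downstream tasks such as long-tailed instance segmentation, as it produces fewer false positive detections. Our extensive experiments show that IIF surpasses the state of the art on many long-tailed benchmarks such as ImageNet-LT, CIFAR-LT, Places-LT, and LVIS, reaching 55.8% top-1 accuracy with ResNet50 on ImageNet-LT and 26.2% segmentation AP with MaskRCNN on LVIS. Code is available at https://github.com/kostas1515/iif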
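A multiplicative adjustment of logits can be illustrated with a toy example. The abstract does not give the weighting formula, so the sketch below assumes an IDF-style weight, log(N / n_c), by analogy with the method's name; the helper names `iif_weights` and `adjust_logits` and the class counts are hypothetical.

```python
import math

def iif_weights(class_counts):
    """IDF-style weight per class: log(total images / images in the class).

    This formula is an assumption made for illustration only.
    """
    total = sum(class_counts)
    return [math.log(total / n) for n in class_counts]

def adjust_logits(logits, weights):
    """Multiply each class logit by its weight (multiplicative adjustment)."""
    return [w * z for w, z in zip(weights, logits)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical long-tailed dataset: one frequent class, two rarer ones.
counts = [900, 90, 10]
raw_logits = [2.0, 1.0, 1.0]

weights = iif_weights(counts)
adjusted = adjust_logits(raw_logits, weights)
raw_prediction = argmax(raw_logits)
adjusted_prediction = argmax(adjusted)
```

In this toy case the rare classes receive larger weights, so a prediction that favored the frequent class before adjustment shifts toward the rarest class afterwards.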


Self-Supervised Depth Estimation in Laparoscopic Image using 3D Geometric Consistency

Baoru Huang , Jian-Qing Zheng , Anh Nguyen , Chi Xu , Ioannis Gkouzionis , Kunal Vyas , David Tuch , Stamatia Giannarou , Daniel S. Elson

Category: Computer Vision

2022-08-17

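Depth estimation is a crucial step for image-guided intervention in robotic surgery and laparoscopic imaging systems. Since per-pixel depth ground truth is difficult to acquire for laparoscopic image data, supervised depth estimation is rarely applicable to surgical applications. As an alternative, self-supervised methods have been introduced that train depth estimators using only synchronized stereo image pairs. However, most recent work has focused on left-right consistency in 2D and ignored the valuable inherent 3D information about objects in real-world coordinates, meaning that left-right 3D geometric structural consistency is not fully utilized. To overcome this limitation, we present M3Depth, a self-supervised depth estimator that leverages the 3D geometric structural information hidden in stereo pairs while keeping monocular inference. The method also uses masking to remove the influence of border regions that are unseen in at least one of the stereo images, enhancing the correspondences between left and right images in overlapping areas. Intensive experiments show that our method outperforms previous self-supervised approaches by a large margin on both a public dataset and a newly acquired dataset, indicating good generalization across different samples and laparoscopes.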
